
Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study

Abstract

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to stay at the forefront of big data analytics by being a unified framework for both batch and stream data processing. However, recent studies on the micro-architectural characterization of in-memory data analytics are limited to batch processing workloads only. We compare the micro-architectural performance of batch processing and stream processing workloads in Apache Spark using hardware performance counters on a dual-socket server. In our evaluation experiments, we have found that batch processing and stream processing workloads have similar micro-architectural characteristics and are bounded by the latency of frequent data accesses to DRAM. For such data accesses, we have found that simultaneous multi-threading is effective in hiding the data latencies. We have also observed that (i) data locality on NUMA nodes can improve performance by 10% on average, (ii) disabling next-line L1-D prefetchers can reduce execution time by up to 14%, and (iii) multiple small executors can provide up to 36% speedup over a single large executor.
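Finding (iii) maps onto ordinary Spark configuration: the choice between one large executor and several small ones is made through the standard `spark.executor.instances`, `spark.executor.cores` and `spark.executor.memory` properties. The sketch below is a hypothetical Python helper (`split_node` and `ExecutorLayout` are illustrative names, not part of Spark and not the authors' tooling) that divides one worker node's cores and memory evenly into N executors and emits those properties. The 24-core, 128 GB dual-socket node is an assumed example, not the paper's testbed. The split ignores `spark.executor.memoryOverhead`, so on cluster managers such as YARN or Kubernetes `reserved_gb` should also leave room for that overhead. The `fits_within_socket` check only reports whether executors tile a socket evenly; actually pinning them to a NUMA node (for example with `numactl`) is outside this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ExecutorLayout:
    """One way to carve a single worker node into equally sized Spark executors."""
    instances: int
    cores_per_executor: int
    memory_gb_per_executor: int

    def to_spark_conf(self) -> Dict[str, str]:
        # Standard Spark property names; values are strings, as they would be
        # passed with `spark-submit --conf key=value`.
        return {
            "spark.executor.instances": str(self.instances),
            "spark.executor.cores": str(self.cores_per_executor),
            "spark.executor.memory": f"{self.memory_gb_per_executor}g",
        }

    def fits_within_socket(self, cores_per_socket: int) -> bool:
        # True when executors tile a socket exactly, so no executor's cores
        # need to straddle two NUMA nodes. No pinning is performed here.
        return cores_per_socket % self.cores_per_executor == 0


def split_node(total_cores: int, total_memory_gb: int, executors: int,
               reserved_gb: int = 0) -> ExecutorLayout:
    """Split a node's cores and memory evenly across `executors` executors.

    `reserved_gb` is held back for the OS and any per-executor overhead.
    """
    if executors < 1 or total_cores % executors != 0:
        raise ValueError("executors must evenly divide total_cores")
    usable_gb = total_memory_gb - reserved_gb
    if usable_gb < executors:
        raise ValueError("not enough memory for at least 1 GB per executor")
    return ExecutorLayout(
        instances=executors,
        cores_per_executor=total_cores // executors,
        memory_gb_per_executor=usable_gb // executors,
    )


if __name__ == "__main__":
    # Hypothetical dual-socket node: 24 cores (12 per socket), 128 GB RAM,
    # with 8 GB held back for the OS.
    single_large = split_node(24, 128, executors=1, reserved_gb=8)
    several_small = split_node(24, 128, executors=4, reserved_gb=8)
    for name, layout in [("single large", single_large),
                         ("several small", several_small)]:
        print(name, layout.to_spark_conf(),
              "fits one socket:", layout.fits_within_socket(12))
```

Whether four 6-core executors actually outperform one 24-core executor on a given machine has to be measured; the up-to-36% figure is the paper's result for its own workloads and server, not something this helper predicts.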

机译：尽管集群计算框架不断发展以提供实时数据分析功能，但Apache Spark已成为批处理和流数据处理的统一框架，因此在大数据分析领域处于领先地位。但是，有关内存数据分析的微体系结构表征的最新研究仅限于批处理工作负载。我们使用双插槽服务器上的硬件性能计数器比较Apache Spark中批处理和流处理工作负载的微体系结构性能。在我们的评估实验中，我们发现批处理是具有相似的微体系结构特征的流处理工作负载，并且受对DRAM的延迟延迟数据访问的限制。对于数据访问，我们发现同时多线程可有效隐藏数据延迟。我们还观察到（i）NUMAnodes上的数据局部性可以平均将性能提高10％；（ii）禁用下一行L1-D预取器可以将执行时间减少多达14％；（iii）多个小型执行器可以比单个大型执行器提速达36％

Bibliographic record

Authors: Awan, Ahsan Javed; Brorsson, Mats; Vlassov, Vladimir; Ayguade, Eduard
Year: 2015
Original format: PDF
Language: English

Similar documents

1. A Study and Performance Comparison of MapReduce and Apache Spark on Twitter Data on Hadoop Cluster [J]. Nowraj Farhan, Ahsan Habib, Arshad Ali. International Journal of Information Technology and Computer Science, 2018, No. 7.

机译：Hadoop集群上Twitter数据上MapReduce和Apache Spark的研究和性能比较
2. Performance Evaluation of Apache Spark Vs MPI: A Practical Case Study on Twitter Sentiment Analysis [J]. Deepa S Kumar, M Abdul Rahman. Journal of Computer Sciences, 2017, No. 12.

机译：Apache Spark与MPI的性能评估：Twitter情感分析的实际案例研究
3. Performance Evaluation of Apache Spark Vs MPI: A Practical Case Study on Twitter Sentiment Analysis [J]. Kumar Deepa S, Rahman M Abdul. Journal of Computer Sciences, 2017, No. 12.

机译：Apache Spark与MPI的性能评估：Twitter情感分析的实际案例研究
4. Analyzing Performance of Apache Spark MLlib with Multinode Clusters on Azure HDInsight: Spark-Perf Case Study [C]. Sergii Minukhin, Natalia Brynza, Dmytro Sitnikov. International Scientific Conference "Intellectual Systems of Decision-Making and Problems of Computational Intelligence", 2021.

机译：Apache Spark Mllib对Azure HDInsight上的MultiLode集群的性能分析：Spark-Perf案例研究
5. A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark's GraphX [D]. Langewisch, Ryan P., 2015.

机译：在Apache Spark的GraphX中执行推入重贴标签最大流量算法的性能研究
6. Learning analytics: Survey data for measuring the impact of study satisfaction on students academic self-efficacy and performance [O]. Petros Kostagiolas, Charilaos Lavranos, Nikolaos Korfiatis, 2019.

机译：学习分析：调查数据用于衡量学习满意度对学生的学业自我效能和绩效的影响
7. The Impacts of Ethanol-Gasoline Blended Fuels on the Pollutant Emissions and Performance of a Spark-Ignition Engine: An Empirical Study [O]. Ümit Ağbulut, Suat Sarıdemir, Gökhan Durucan, 2018.

机译：乙醇 - 汽油混合燃料对火花点火发动机污染物排放和性能的影响：实证研究

Architectural Impact on Performance of In-memoryData Analytics: Apache Spark Case Study

摘要

著录项

相似文献

相关主题

期刊订阅